Development and validation of a metabolite index for obstructive sleep apnea across race/ethnicities

Obstructive sleep apnea (OSA) is a common disorder characterized by recurrent episodes of upper airway obstruction during sleep resulting in oxygen desaturation and sleep fragmentation, and associated with increased risk of adverse health outcomes. Metabolites are being increasingly used for biomarker discovery and evaluation of disease processes and progression. Studying metabolomic associations with OSA in a diverse community-based cohort may provide insights into the pathophysiology of OSA. We aimed to develop and replicate a metabolite index for OSA and identify individual metabolites associated with OSA. We studied 219 metabolites and their associations with the apnea hypopnea index (AHI) and with moderate-severe OSA (AHI ≥ 15) in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) (n = 3507) using two methods: (1) association analysis of individual metabolites, and (2) least absolute shrinkage and selection operator (LASSO) regression to identify a subset of metabolites jointly associated with OSA, which was used to develop a metabolite index for OSA. Results were validated in the Multi-Ethnic Study of Atherosclerosis (MESA) (n = 475). When assessing the associations with individual metabolites, we identified seven metabolites significantly positively associated with OSA in HCHS/SOL (FDR p < 0.05), of which four associations (glutamate, oleoyl-linoleoyl-glycerol (18:1/18:2), linoleoyl-linoleoyl-glycerol (18:2/18:2), and phenylalanine) were replicated in MESA (one-sided p < 0.05). The OSA metabolite index, composed of 14 metabolites, was associated with 50% higher odds of moderate-severe OSA (OR = 1.50 [95% CI 1.21–1.85] per 1 SD of OSA metabolite index, p < 0.001) in HCHS/SOL and 55% higher odds (OR = 1.55 [95% CI 1.10–2.20] per 1 SD of OSA metabolite index, p = 0.013) in MESA, both adjusted for demographics, lifestyle, and comorbidities. Similar, albeit less significant, associations were observed for AHI.
Replication of the metabolite index in an independent multi-ethnic dataset demonstrates the robustness of the metabolomic-based OSA index to population heterogeneity. Replicated metabolite associations may provide insights into OSA-related molecular and metabolic mechanisms.
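As a rough consistency check on the reported estimates, the following sketch recovers the log-odds, standard error, and two-sided p-value from each odds ratio and its 95% confidence interval. It assumes the intervals are symmetric Wald intervals on the log-odds scale, which the text does not state; the function name `wald_from_or` is hypothetical.

```python
import math
from statistics import NormalDist


def wald_from_or(or_est, ci_low, ci_high, level=0.95):
    """Back out beta, SE, z and a two-sided p-value from an OR and its CI.

    Assumes a Wald interval: exp(beta +/- z_crit * SE).
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    beta = math.log(or_est)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return beta, se, z, p


# Values reported in the abstract (per 1 SD of the OSA metabolite index).
hchs = wald_from_or(1.50, 1.21, 1.85)
mesa = wald_from_or(1.55, 1.10, 2.20)
print(f"HCHS/SOL: beta={hchs[0]:.3f}, SE={hchs[1]:.3f}, p={hchs[3]:.5f}")
print(f"MESA:     beta={mesa[0]:.3f}, SE={mesa[1]:.3f}, p={mesa[3]:.3f}")
```

Under this assumption the back-calculated p-values agree with the reported ones (p < 0.001 in HCHS/SOL and p of about 0.013 in MESA), up to rounding of the published interval limits.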

The Multi-Ethnic Study of Atherosclerosis. MESA is a cohort study designed to study risk factors for clinical and subclinical cardiovascular diseases (CVD) in four racial/ethnic groups 21. The study began in July 2000 and recruited 6,814 adults free of clinical CVD and aged 45-84 years from 6 centers: Baltimore, MD; Chicago, IL; Los Angeles, CA; New York, NY; Saint Paul, MN; and Winston-Salem, NC. Participants continued to be studied through subsequent follow-up exams. Of the 4,077 participants who attended Exam 5 (2010-2012), 2,261 participated in the MESA Sleep ancillary study (2010-2013), which occurred at a mean interval of 301 days (range 0-1,024 days) after MESA Exam 5. The sleep study was only conducted on participants who were not receiving sleep apnea treatment. As reported before 6, participants in the Sleep Exam were generally similar to non-participants. OSA was assessed with Type II in-home polysomnography. AHI was defined as the total number of apneas and hypopneas with at least 30% reduction in the nasal flow signal and with ≥ 3% oxygen desaturation per hour of sleep. Metabolomic data were collected on 1,000 randomly selected participants from the fasting samples collected during the MESA Exam 5 core exam. Of these, 475 participants also had sleep measures and are included in this analysis. All MESA participants provided written informed consent, and the study was approved by the Institutional Review Boards at The Lundquist Institute (formerly Los Angeles BioMedical Research Institute) at Harbor-UCLA Medical Center, University of Washington, Wake Forest School of Medicine, Northwestern University, University of Minnesota, Columbia University, and Johns Hopkins University. All methods and analyses of MESA participants' materials and data were carried out in accordance with human subject research guidelines and regulations.
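The AHI definition above reduces to a simple rate. The sketch below assumes that qualifying apneas and hypopneas have already been scored upstream (the flow-reduction and desaturation criteria are not re-implemented), and the function names are hypothetical. The AHI ≥ 15 cut-off for moderate-severe OSA is the one used in this study.

```python
def apnea_hypopnea_index(n_apneas, n_hypopneas, total_sleep_minutes):
    """Events per hour of sleep, given counts of already-scored events."""
    if total_sleep_minutes <= 0:
        raise ValueError("total sleep time must be positive")
    return (n_apneas + n_hypopneas) / (total_sleep_minutes / 60.0)


def has_moderate_severe_osa(ahi, threshold=15.0):
    """Binary outcome used in the analysis: AHI >= 15."""
    return ahi >= threshold


# Toy example: 40 apneas and 80 hypopneas over 7 hours of sleep.
ahi = apnea_hypopnea_index(40, 80, 420)
print(round(ahi, 2), has_moderate_severe_osa(ahi))
```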
MESA metabolomic profiling. Metabolite profiling was performed using liquid chromatography tandem mass spectrometry (LC-MS). Positive ion mode profiling of water-soluble metabolites and lipids was performed using LC-MS systems comprised of Nexera X2 U-HPLC (Shimadzu Corp.; Marlborough, MA) units coupled to a Q Exactive mass spectrometer (Thermo Fisher Scientific; Waltham, MA). Polar metabolites were analyzed using hydrophilic interaction liquid chromatography (HILIC) and lipids were analyzed separately using reversed phase C8 chromatography as described in detail previously 22. Raw data were processed using TraceFinder 3.1 (Thermo Fisher Scientific; Waltham, MA) and Progenesis QI (Nonlinear Dynamics; Newcastle upon Tyne, UK). To measure organic acids and other intermediary metabolites in negative ionization mode, chromatography was performed using an Agilent 1290 infinity LC system equipped with a Waters XBridge Amide column, coupled to an Agilent 6490 triple quadrupole mass spectrometer. Metabolite transitions were assayed using a dynamic multiple reaction monitoring system. LC-MS data were analyzed with Agilent Masshunter QQQ Quantitative analysis software. Isotope labeled internal standards were monitored in each sample to ensure proper MS sensitivity for quality control. Pooled plasma samples were interspersed at intervals of 10 participant samples for standardization of drift over time and between batches. Additionally, separate pooled plasma was interspersed at every 20 injections to determine coefficient of variation for each metabolite over the run. Peaks were manually reviewed in a blinded fashion to assess quality. For each method, metabolite identities were confirmed using authentic reference standards or reference samples. Metabolites with poor peak quality and coefficients of variation greater than 30% averaged across batches were removed from analysis.

Quality control of metabolites in HCHS/SOL and MESA.
Missing metabolite values were addressed as described in Supplementary Fig. S1. In our discovery sample (HCHS/SOL), we excluded individuals with more than 25% missing metabolite values, and excluded metabolites with missing values for 75% or more of individuals. For metabolites with more than 25% and less than 75% missing values, values were dichotomized as "observed" and "unobserved". For metabolites with less than 25% missing values, we imputed the missing values using the minimum observed value of the metabolite in the sample, under the assumption that metabolites were not observed due to a technical detection limit. Because our study design included validation analysis, we focused on metabolites available in both HCHS/SOL and MESA. Before any quality control (QC) methods were applied, 231 HCHS/SOL metabolites were mapped to 294 metabolites in MESA. The mapping of MESA to HCHS/SOL metabolites as well as to RefMet ID was done at the Clish Lab. MESA had multiple metabolites matched to a single HCHS/SOL metabolite in multiple instances because the same metabolite was measured via more than one platform used by MESA. In some cases, a single metabolite appears as two highly correlated ion features in the same MESA platform (e.g., some neutral lipids were measured as both sodium and ammonium adducts). Therefore, a single feature was mapped to the metabolite in HCHS/SOL while the redundant features were dropped according to the following principles: features with redundant ions were excluded; features with lower missingness and lower skewness were prioritized. After removing 60 such redundant features in MESA and applying QC methods based on the missingness in HCHS/SOL, 219 HCHS/SOL metabolites were mapped to 219 metabolites in MESA.
Supplementary Table S1 provides the list of the 294 initially matched metabolites cross-referenced by RefMet ID and metabolite annotations including HMDB IDs provided by Metabolon, along with details regarding metabolite-specific QC resulting in the final list of one-to-one matched metabolites. The serum concentration values of the matched metabolites that were treated as continuous were rank-normalized.
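The text does not specify which rank-normalization variant was used. The sketch below shows one common choice, a rank-based inverse normal transform with average ranks for ties and a 0.5 offset; both the variant and the function name `rank_normalize` are assumptions made for illustration.

```python
from statistics import NormalDist


def rank_normalize(values):
    """Rank-based inverse normal transform (assumed variant).

    Ties receive their average rank; rank r of n maps to the normal
    quantile at (r - 0.5) / n, which always lies strictly inside (0, 1).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - 0.5) / n) for r in ranks]


z = rank_normalize([3.0, 1.0, 2.0])
print([round(v, 3) for v in z])
```

The transform depends only on the ordering of the values, so it is insensitive to the skewed concentration scales typical of metabolomic data.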
Because MESA was a validation study, we only evaluated metabolites that were identified in the association analysis in HCHS/SOL. The missing data for these metabolites were always < 25%, so we treated these as continuous variables and imputed missing values with the minimum observed value in the MESA sample.
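The per-metabolite missingness rules described for the discovery sample can be sketched as follows. Missing values are represented as `None`, the function name `handle_missing` is hypothetical, and the treatment of a metabolite with exactly 25% missing values (handled here as continuous) is an assumption, since the text only covers "more than" and "less than" 25%.

```python
def handle_missing(values):
    """Apply the missingness rule to one metabolite.

    values: list of floats, with None marking a missing measurement.
    Returns (kind, processed_values).
    """
    frac_missing = sum(v is None for v in values) / len(values)
    if frac_missing >= 0.75:
        # Too sparse: metabolite is dropped from the analysis.
        return "excluded", None
    if frac_missing > 0.25:
        # Dichotomize as observed (1) versus unobserved (0).
        return "dichotomized", [0 if v is None else 1 for v in values]
    # Impute with the minimum observed value (detection-limit assumption).
    floor = min(v for v in values if v is not None)
    return "continuous", [floor if v is None else v for v in values]


print(handle_missing([1.0, None, 3.0, 4.0, 5.0]))
print(handle_missing([1.0, None, None, 4.0]))
print(handle_missing([None, None, None, 1.0]))
```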
Statistical analysis. Association analyses were based on three conceptual regression models. Model 1 (i.e. primary model) adjusted for demographic variables and body mass index (BMI): age, gender, study center, Hispanic background (Mexicans, Puerto Ricans, Cubans, Central Americans, Dominicans, and South Americans and other/multi), and BMI in HCHS/SOL; age, gender, study site (two sites with low sample sizes were combined), race (White versus "Non-White", which consists of Hispanic, Black and Chinese Americans), and BMI in MESA. Model 2 (i.e. lifestyle model) adjusted for the model 1 covariates and lifestyle variables: alcohol use, cigarette use, total physical activity (MET-min/day), and diet (Alternative Healthy Eating Index 2010) in HCHS/SOL; alcohol use and cigarette use in MESA. Model 3 (i.e. lifestyle and comorbidity model) adjusted for the model 2 covariates and comorbidity variables: indicators for diabetes and hypertension, fasting insulin, fasting glucose, HOMA-IR, HDL, LDL, total cholesterol, triglycerides, systolic blood pressure and diastolic blood pressure in HCHS/SOL; hypertension, fasting glucose, HDL, LDL, cholesterol, triglycerides, systolic blood pressure and diastolic blood pressure in MESA. We used complete data with respect to covariates: individuals with missing covariates were removed, so models with more covariates usually had lower sample sizes.
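The effect of complete-case filtering on sample size across the three nested HCHS/SOL models can be illustrated with a short sketch. The variable names (for example `ahei2010`, `homa_ir`) are hypothetical stand-ins for the covariates listed above, and the three records are toy data, not study data.

```python
def complete_cases(records, covariates):
    """Keep records with no missing (None) value in any listed covariate."""
    return [r for r in records if all(r.get(c) is not None for c in covariates)]


# Nested covariate sets mirroring models 1-3 in HCHS/SOL (names are illustrative).
model1 = ["age", "gender", "center", "background", "bmi"]
model2 = model1 + ["alcohol_use", "cigarette_use", "physical_activity", "ahei2010"]
model3 = model2 + [
    "diabetes", "hypertension", "fasting_insulin", "fasting_glucose", "homa_ir",
    "hdl", "ldl", "total_cholesterol", "triglycerides", "sbp", "dbp",
]

# Toy records: one complete, one missing diet, one missing HDL.
full = {c: 1.0 for c in model3}
records = [full, dict(full, ahei2010=None), dict(full, hdl=None)]

sizes = [len(complete_cases(records, m)) for m in (model1, model2, model3)]
print(sizes)  # sample size can only shrink as covariates are added
```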
Single Metabolite Association (SMA) analysis for OSA and AHI. We tested the associations of each of 219 metabolites (both continuous and dichotomized metabolites) with moderate-severe OSA and AHI in HCHS/SOL. Each metabolite was the exposure in either linear or logistic regression (depending on the outcome) for each model. We accounted for the HCHS/SOL study design (sampling and clustering) and obtained representative effect estimates using survey regression implemented in the R survey package (4.0) 23. We controlled the false discovery rate (FDR) using the Benjamini-Hochberg procedure 24 and determined significant associations as those with FDR p-value < 0.05. We visualized the Spearman's correlations among significant metabolites in HCHS/SOL. In the replication analysis, we tested the associations of these metabolites with OSA in logistic regression and with AHI in linear regression in MESA in models 1-3. We computed one-sided p-values guided by the estimated direction of the SMA analysis results in HCHS/SOL 25, and determined replication if the one-sided p-value was < 0.05.
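The two multiplicity steps above, FDR control in discovery and direction-guided one-sided testing in replication, can be sketched as follows. The Benjamini-Hochberg adjustment is the standard step-up procedure; the one-sided conversion assumes a symmetric test statistic (for example a Wald z), which is an assumption about how the one-sided p-values were derived. Function names and the example p-values are illustrative only.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def one_sided_p(two_sided_p, estimate, discovery_direction):
    """One-sided p-value in the direction estimated in the discovery study."""
    same_direction = (estimate > 0) == (discovery_direction > 0)
    return two_sided_p / 2 if same_direction else 1 - two_sided_p / 2


fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([round(p, 3) for p in fdr])
print(one_sided_p(0.08, estimate=0.3, discovery_direction=+1))
print(one_sided_p(0.08, estimate=-0.3, discovery_direction=+1))
```

An estimate in the opposite direction to discovery yields a one-sided p-value above 0.5, so it can never count as a replication.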
For the secondary analyses, gender-stratified analysis and interaction analysis between individual metabolite and gender were conducted to assess potential gender differences. We performed gender-stratified analysis regardless of interaction p-values because interaction models typically assume an additive difference in the metabolite association with the outcome between gender groups while all other model parameters are assumed identical. We also conducted the SMA analysis for OSA and AHI in the full set of 1136 metabolites from HCHS/SOL to encourage hypothesis generation by the research community.

Metabolite indices construction and validation.
We applied LASSO logistic regression for moderate to severe OSA versus no or mild OSA (for brevity, "OSA versus no OSA") and LASSO linear regression for log-transformed AHI, log(AHI + 1), both adjusted for the covariates in model 1 in HCHS/SOL. We included 209 continuous metabolites (not including the 10 dichotomized metabolites). We selected the LASSO tuning parameter by maximizing the area-under-curve for OSA, and minimizing the prediction error for AHI, in a tenfold cross-validation. LASSO metabolite indices were calculated as a weighted sum of the (normalized) metabolite serum concentrations, with weights being the metabolite coefficients from the LASSO regression. For interpretability of association, we then standardized or "z-scored" the indices by subtracting the sample mean and then dividing the resulting number by the sample standard deviation. As secondary analyses, we constructed additional metabolite indices using the metabolites identified in the SMA analysis (FDR p < 0.05): (1) using the effect size estimates from the SMA analysis as weights (SMA); (2) fitting an un-penalized regression model using these metabolites and extracting the coefficients from the model as weights (SMA-GLM). The goal of the latter approach was to better account for correlations among metabolites. We then calculated the weighted sum of the normalized metabolite serum levels, which was then z-scored to generate two additional sets of metabolite indices.
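The index construction described above (weighted sum, then z-scoring with the sample mean and sample standard deviation) can be sketched as below. The metabolite names, weights, and profiles are made-up placeholders, not the coefficients from this study, which are given in the supplementary material.

```python
from statistics import mean, stdev


def weighted_sum(profile, weights):
    """Sum of normalized metabolite levels times their regression weights."""
    return sum(w * profile[name] for name, w in weights.items())


def z_score(xs):
    """Subtract the sample mean and divide by the sample standard deviation."""
    mu, sd = mean(xs), stdev(xs)
    return [(x - mu) / sd for x in xs]


# Hypothetical weights and rank-normalized levels for three participants.
weights = {"metabolite_a": 0.30, "metabolite_b": -0.10}
profiles = [
    {"metabolite_a": 1.0, "metabolite_b": 2.0},
    {"metabolite_a": -0.5, "metabolite_b": 0.0},
    {"metabolite_a": 0.2, "metabolite_b": -1.0},
]

raw_index = [weighted_sum(p, weights) for p in profiles]
index = z_score(raw_index)
print([round(v, 3) for v in raw_index], [round(v, 3) for v in index])
```

Because the index is z-scored, an odds ratio from a regression on it is interpreted per 1 SD of the index.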
To validate the associations of the metabolite indices with their outcomes, we constructed, when possible (i.e. when metabolites were identified in LASSO/SMA), the three types of metabolite indices in MESA by rank-normalizing the matched metabolites within MESA, and then summing them using the weights developed in HCHS/SOL. We then assessed the associations of the indices, z-scored using MESA-specific means and SDs, with their corresponding sleep traits (i.e. OSA, AHI) in models 1-3. In the secondary analyses we assessed (1) potential gender differences using gender-stratified LASSO, and, separately, gender-stratified association analysis for metabolite indices constructed based on combined genders; (2) potential temporal effect in MESA by adjusting for the time differences between the sleep study and the metabolite profiling; (3) potential racial/ethnic effects by limiting the study sample in MESA to Hispanics; (4) associations with other sleep disordered breathing phenotypes (i.e. the percentage of sleep time with oxyhemoglobin saturation below 90% (Pctlt90), minimum oxygen saturation, average oxygen saturation, and average respiratory event length). We also assessed the associations between metabolite indices quartiles and the corresponding sleep traits.
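For the quartile analysis, each participant's index value has to be assigned to a within-sample quartile. A minimal sketch follows, assuming cut points from Python's default ("exclusive") quantile method; the text does not say how quartile boundaries or ties were handled, and the function name is hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def assign_quartiles(index_values):
    """Return a quartile label (1-4) for each value, using sample cut points."""
    cuts = quantiles(index_values, n=4)  # three cut points: Q1, median, Q3
    return [bisect_right(cuts, v) + 1 for v in index_values]


# Toy index values for eight participants.
toy_index = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
labels = assign_quartiles(toy_index)
print(labels)
```

The highest quartile can then be contrasted with the lowest as a categorical exposure in the same regression models.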
All analyses were done in R 3.6.3. The glmnet package (3.0) 26 in R was used for the LASSO logistic regression. R code for constructing metabolite indices using metabolites and weights developed in this work is provided in Supplementary File 1. Table 1 characterizes the HCHS/SOL analytic sample and target population.

Participant characteristics.
The HCHS/SOL cohort included 3,507 participants, with a mean age of 41.72 years (SD = 15.4), of whom 50.7% were female and 10.2% were classified with moderate or severe OSA (AHI ≥ 15). Participants with OSA were more likely to be males, had a higher BMI, and were less likely to be never smokers compared to those without OSA. Individuals with OSA were also more likely to have comorbidities: 60% had hypertension and 34.3% had diabetes, compared to 27.8% with hypertension and 16.9% with diabetes in those without OSA. Supplementary Table S2 characterizes the 475 MESA participants with metabolomics, sleep, and required measured covariates from the validation dataset. Compared to HCHS/SOL, MESA participants were older (mean 68.45 years, SD = 9.33), with a higher proportion of females (56.2%). Reflecting their older age, more MESA participants had moderate or severe OSA (46.7%) compared to HCHS/SOL.

Metabolite associations with OSA and AHI.
Table 2 shows the odds ratios corresponding to 7 metabolites associated (FDR p < 0.05) with OSA in HCHS/SOL (adjusted for age, gender, BMI, study site/center, and race/ethnicity). Figure 2 and Supplementary Table S3 show the lifestyle-adjusted model and comorbidity-adjusted model results for the 7 metabolites, while results of all tested metabolites can be found in Supplementary Table S4. Among the 7 mapped metabolites in MESA, 4 metabolite associations had one-sided p-values < 0.05: glutamate, phenylalanine, linoleoyl-linoleoyl-glycerol (18:2/18:2), and oleoyl-linoleoyl-glycerol (18:1/18:2), all of which were associated with increased risk for OSA (Fig. 2). These associations also had FDR p-value < 0.05 in MESA. No metabolite was associated with AHI after multiple testing correction in HCHS/SOL (Supplementary Table S5). Also, no metabolite associations were detected at the FDR p < 0.05 level in minimally adjusted gender-stratified analyses in HCHS/SOL (Supplementary Tables S6-S9). Results for interaction analysis between metabolites and gender for OSA and AHI are provided in Supplementary Tables S10 and S11, respectively.
In the secondary analysis, the SMA analysis was performed in all metabolites assessed in HCHS/SOL, including 782 known and 354 unknown (unidentified) metabolites. Of these, 20 and 4 metabolites were found to be significantly associated with OSA and AHI (FDR p < 0.05), respectively (Supplementary Tables S12-S17).

LASSO regression for joint selection and estimation of metabolite associations with OSA and AHI in HCHS/SOL.
We used a LASSO regression to select a set of metabolites that were jointly associated with sleep apnea traits in HCHS/SOL. Among the 14 metabolites identified for OSA by LASSO (Fig. 3), there were one carbohydrate, one peptide, three amino acids, three lipids, three nucleotides, and three cofactors and vitamins. Biliverdin and serine were unique to the OSA metabolite index while the remaining 12 metabolites were shared between the OSA and AHI metabolite indices. A total of 41 metabolites were identified for AHI, among which 29 metabolites were unique to the AHI metabolite index. Coefficients for all metabolites from LASSO trained in gender-combined and gender-stratified samples are provided in the Supplementary Information (Supplementary Table S18). The weighted sum of the selected metabolites, with their LASSO beta coefficients as weights, was then z-scored to generate the metabolite index (the study means and SDs used in z-scoring are provided in Supplementary Table S19).

Metabolite indices associations with OSA and AHI in HCHS/SOL and in independent validation in MESA.
We constructed OSA and AHI metabolite indices in both HCHS/SOL and MESA based on the weights from the LASSO regressions conducted in HCHS/SOL. [...] comorbidity covariates [OR: 2.63; 95% CI (1.14-6.14); p = 0.024] (see Fig. 4 and Supplementary Table S20). AHI metabolite index associations had higher p-values in MESA, compared to OSA metabolite index associations (Table 3). The AHI metabolite index was only replicated in women, adjusted for demographic, lifestyle, and comorbidity covariates in MESA. Notably, both OSA metabolite index and AHI metabolite index associations with their respective phenotypes were stronger when evaluated in women compared to the overall sample, in both HCHS/SOL and MESA. Results from the secondary analyses of gender-specific metabolite indices are also provided in Table 3. Only the female-specific metabolite indices were replicated in MESA in models 1 and 2, but their associations with sleep apnea traits in women were weaker than those of the metabolite indices trained on the full HCHS/SOL sample, both in terms of p-value and of estimated effect size.
When adjusted for the time interval between the sleep exam and the blood collection in MESA, the estimated effect sizes and p-values of the associations between OSA and the metabolite index did not substantially change in any model (Supplementary Table S21). When limiting the MESA study sample to only Hispanics, the associations were slightly attenuated except in model 3 (Supplementary Fig. S2). The associations between the OSA LASSO metabolite index and other sleep disordered breathing phenotypes showed consistent directionality with its OSA association (Supplementary Fig. S3): the index was positively associated with typical OSA severity measures (e.g. Pctlt90) and inversely associated with minimum and average oxyhemoglobin saturation. There was no evidence of association of the metabolite index with average respiratory event length. In MESA, the OSA metabolite index was associated with average and minimum oxygen saturation but its associations with other phenotypes were weaker.
Two additional metabolite indices were constructed for OSA based on the 7 significant SMA discovery results. The SMA metabolite index constructed with coefficients extracted directly from the SMA analysis as weights did not show as strong associations with OSA in HCHS/SOL or MESA, while the SMA-GLM metabolite index with coefficients extracted from the joint unpenalized generalized linear regression using the 7 metabolites together showed comparable effect size estimates to the LASSO-based OSA metabolite index (Supplementary  Table S21, Supplementary Fig. S2). Parameters for constructing the SMA and the SMA-GLM metabolite indices (i.e. metabolite coefficients, mean and SD for z-scoring) are provided in Supplementary Table S22; the correlation matrix of the 7 metabolites is provided in Supplementary Fig. S4.

Discussion
In this paper, we leveraged metabolomics data from two large, diverse community-based cohorts to derive the first metabolite index for moderate to severe OSA, as well as to identify individual metabolites associated with this disorder. We studied 219 metabolites and their associations with OSA and AHI in the HCHS/SOL using two methods: (1) analysis of individual metabolites, and (2) LASSO to identify a subset of metabolites that jointly predicted OSA or AHI. Then, we studied the associations in an independent validation study, MESA. We used the results from LASSO to derive OSA and AHI metabolite indices. In MESA, the OSA metabolite index was significantly associated with its respective phenotype, moderate to severe OSA; e.g., individuals in the highest quartile for OSA metabolite index had a more than twofold increased odds of moderate to severe OSA, both in the derivation sample and in an independent sample that varied by ancestry, age, and OSA prevalence. Findings persisted after adjusting for multiple lifestyle and health covariates. In contrast, when modeling AHI as a continuous measure of sleep apnea, weaker associations were observed, except for the top quartiles among females. In the association analysis of individual metabolites, seven metabolites were associated with OSA in HCHS/SOL (FDR p < 0.05), of which four associations were replicated in MESA.
We implemented two main approaches to study the metabolomic correlates of sleep apnea phenotypes: LASSO and individual-metabolite regression analysis. These approaches serve different purposes: single metabolite regression highlights individual metabolites associated with sleep apnea phenotypes without adjustment for other metabolites, while LASSO estimates the combined effect of multiple metabolites. A metabolite identified as associated with sleep apnea phenotypes individually may not be selected by the LASSO analysis (e.g. if a different metabolite correlated with it was selected by LASSO). Thus, it is not surprising that two of the replicated metabolites identified in single metabolite analysis were not selected by LASSO. Similarly, a specific metabolite may be selected by LASSO, but not by individual metabolite analysis due to adjustment for multiple testing, which is not done in the LASSO analysis. The OSA metabolite index, constructed based on the LASSO results, is a single index using multiple blood biomarkers which together reflect the biochemical differences in the blood of individuals with and without OSA. The OSA metabolite index also showed stronger association with OSA than any single metabolite, in both the discovery and validation study, consistent with the influence of multiple metabolites in OSA pathophysiology (see Table 3, Supplementary Table S3). When comparing the three metabolite indices, the SMA metabolite index did not perform as well, likely due to correlations among the metabolites (Supplementary Fig. S4), while the SMA-GLM metabolite index is comparable to the LASSO-based metabolite index (Supplementary Table S3). While future work is needed to study whether the metabolite index can be used in the clinic for screening or clinical management of patients with OSA, the study results support its statistical and epidemiological utility given that the metabolite index had higher statistical power than analyses testing single metabolite associations. These indices, providing biologically relevant and novel associations with OSA, may enable additional studies of the pathophysiology of OSA and its relationship with other cardiometabolic conditions. However, additional research is needed to understand the best analytical strategy (i.e., one-step approach like LASSO or two-step approach like SMA-GLM).
Figure: Odds ratio of metabolite index for OSA by quartiles in HCHS/SOL and MESA. * indicates p < 0.05. ** indicates p < 0.01. *** indicates p < 0.001. In HCHS/SOL: Model 1 adjusted for age, gender, center, background, and BMI. Model 2 adjusted for age, gender, center, background, BMI, alcohol use, smoking status, physical activity and diet (AHEI 2010). Model 3 adjusted for age, gender, center, background, BMI, alcohol use, smoking status, physical activity, diet, T2DM, hypertension, fasting glucose, fasting insulin, HOMA IR, HDL, LDL, total cholesterol, triglycerides, systolic blood pressure and diastolic blood pressure. In MESA: Model 1 adjusted for age, gender, BMI, study site (sites WFU and UCLA are combined due to low cell count), and race. Model 2 adjusted for age, gender, BMI, study site, race, alcohol use and smoking status. Model 3 adjusted for age, gender, BMI, study site, race, alcohol use, smoking status, hypertension indicator, fasting glucose, HDL, LDL, cholesterol, triglycerides, systolic blood pressure and diastolic blood pressure.
Four metabolites were replicated in the single metabolite regression: glutamate, oleoyl-linoleoyl-glycerol (18:1/18:2) (DAG(36:3)), linoleoyl-linoleoyl-glycerol (18:2/18:2) (DAG(36:4)) and phenylalanine, among which glutamate and phenylalanine remained positively associated with OSA after adjusting for lifestyle and comorbidities in addition to basic demographics. Both metabolites have some previous evidence linking them to OSA or other sleep disorders, as well as cardiometabolic diseases, and suggest that elevations in glutamate and phenylalanine can be investigated as biomarkers for adverse outcomes in patients with OSA. High plasma glutamate has been associated with OSA-related health factors, including total and visceral adiposity, dyslipidemia and insulin resistance 27, as well as increased risks for incident cardiovascular disease 28, type 2 diabetes 29, and subclinical atherosclerosis 30, independent of established cardiovascular risk factors. The positive association between glutamate and OSA observed in our study may suggest a shared metabolomic profile for OSA and other cardiometabolic phenotypes. The sources for elevated glutamate are unclear but previous studies in animals (rats) and humans showed that more frequent apneas during sleep led to an increased level of glutamate in the brain 31-33. Glutamate is the major excitatory neurotransmitter in the brain, and modulates brain energy metabolism and neuronal synaptic plasticity. Although the blood-brain barrier prevents plasma glutamate from freely permeating into the central nervous system 34, when glutamate increases in the brain, the brain-to-blood glutamate efflux also increases, as suggested by the correlation between the peripheral glutamate and the central nervous system glutamate levels 35. These findings support further research addressing the roles of peripheral and central glutamate in the pathophysiology of OSA. Phenylalanine is an essential aromatic amino acid that plays a key role in the biosynthesis of other amino acids and of neurotransmitters, including dopamine and norepinephrine. Studies have shown that plasma phenylalanine level can be elevated due to inflammation 36,37, which is a common finding in OSA 38. One mechanism for the increased levels of phenylalanine may be through chronic hypoxia, which has been reported to increase both systemic and cerebral delivery of phenylalanine 39. This is in line with our evidence: peripheral phenylalanine was elevated among individuals with moderate to severe OSA (Supplementary Table S3). A prior lab-based study that measured a few metabolites over the course of sleep reported that phenylalanine levels decreased less overnight among patients with OSA compared to controls 40. Phenylalanine levels were also reported to be elevated after sleep restriction 41.
The downstream effects of phenylalanine have been studied more widely in other chronic conditions, with reports of associations with elevated pro-inflammatory cytokines, suppressed immunity and increased mortality among heart failure patients 42 , and with more rapid telomere shortening consistent with accelerated aging 43 . Recent studies reported elevations of phenylalanine associated with chronic obstructive lung disease severity 44 , which was postulated to reflect muscle breakdown and respiratory muscle insufficiency 45 . Elevated plasma phenylalanine was shown to be a strong predictor for cardiovascular risk 46 , and a biomarker, mediator and potentially therapeutic target for pulmonary hypertension 47 . Further research on the association of OSA and phenylalanine may further identify the roles of hypoxia, inflammation, and muscle function in the pathophysiology of OSA and cardiometabolic conditions.
Our study also demonstrated that increased plasma levels of two diacylglycerols (DAGs), DAG(36:3) and DAG(36:4), were associated with moderate to severe OSA. Altered lipid metabolism is often observed among patients with OSA 48-50; specifically, intermittent hypoxemia can stimulate lipolysis, increasing free fatty acid levels 51. Abnormalities in lipid metabolism may result in liver and skeletal muscle fat deposition, exacerbating OSA through inflammatory or muscle-related pathways 52. Therefore, the associations with these diacylglycerols may reflect mechanisms by which OSA-related hypoxemia alters fatty acid metabolism. These associations, however, did not replicate in the validation study once adjusted for comorbidities, suggesting that the associations might be confounded by cardiometabolic conditions that often accompany OSA.
Estimated OSA associations, in both LASSO and single metabolite analysis, were generally stronger than AHI associations. Potential reasons are the variability of AHI across its continuum (with high night to night variability noted at low to mildly elevated levels) and the potential non-linear metabolomic associations with AHI not easily modeled in these analyses. Notably, non-linearity (i.e. a threshold effect) was previously shown for AHI association with hypoxemia and sympathetic nervous system activation burden 53 .
Gender differences have been increasingly reported among individuals with OSA 17 . Population-based studies have shown that prevalence of OSA is higher among men than women, particularly in pre-menopausal women 54 . However, some data indicate that metabolic syndrome and cardiovascular conditions are more strongly associated with OSA among female patients compared to males 55,56 . Indeed, when assessing the metabolite indices developed in both genders and tested for their associations with OSA and AHI, we observed stronger associations among women than men in the MESA validation dataset (Table 3). However, compared to the metabolite indices developed in combined gender strata, gender-specific metabolite indices had weaker associations with the OSA/AHI in the validation data set (weaker effect size estimates and higher p-value), which may be the result of lower statistical power when using a smaller dataset for discovery or differences in the age distributions of the two samples.
In addition to pointing to novel individual metabolites that play a role in OSA, our metabolite indices also showed moderately strong associations with OSA in an external sample, despite the marked differences in race/ethnicity, age, and OSA prevalence compared to the discovery sample. We found no evidence to support the superiority of the index in a sample restricted to Hispanic individuals (Supplementary Fig. S2 and Supplementary Table S21). This supports the overall generalizability of the metabolite index across diverse populations. Nonetheless, the utility of a twofold increased risk of OSA among individuals in the highest metabolite index quartile in helping to screen or triage patients for more comprehensive testing will need to be formally evaluated, potentially combining metabolite data with other information, such as OSA-related symptoms, to improve screening.
A strength of our study is that our population-based sample is more than tenfold larger than those of prior studies 57 , includes a high proportion of ethnic/racial minorities who have been under-represented in research but are at increased risk for adverse health outcomes, and is more representative of samples in the general population, which include large numbers of undiagnosed individuals. We used rigorous statistical methods, adjusted for a large number of lifestyle and health covariates, and were able to replicate the main findings despite large differences in our discovery and validation populations, which suggests relatively strong associations and generalizability of the metabolite associations with OSA.
There are several limitations in this study. The temporal relationship between the blood sample collection and sleep test was concurrent in HCHS/SOL and on average one year apart in MESA, allowing for cross-sectional associations, but limiting our ability to discern causal pathways. The associations between OSA and the metabolite index did not substantively change even after controlling for the time interval between blood draw and sleep test in MESA, suggesting the metabolite profiling for OSA is stable (Supplementary Tables S19, S3). Although over 1000 metabolites were quantified in both populations, fewer than 300 metabolites were matched between the two platforms (after quality control only 219 distinct metabolites were mapped). We limited our study to only the matched metabolites to allow for replication testing, which strengthens the results and conclusions (results on the full set of metabolites are provided in Supplementary Tables S12-S17). MESA metabolomic profiling was conducted using three complementary platforms measuring several broad classes of small molecules; therefore, multiple chemical compounds from MESA were mapped to the same metabolite in HCHS/SOL. We chose a single feature to map to any HCHS/SOL metabolite based on a set of rules related to the presence of redundant ions, data missingness, and skewness. In the future, other more optimal approaches may be proposed and studied. Some associations failed to replicate in MESA, potentially due to heterogeneity between populations and low power in MESA, which had a small sample size. Finally, the definitions of AHI differed slightly in the two studies: while the 3% oxyhemoglobin desaturation criterion applied to hypopneas only in MESA, due to differences in the recording montage, a 3% desaturation criterion was applied for all respiratory events in HCHS/SOL.
It is important to consider how to use, interpret, and transfer metabolite indices that combine multiple metabolites across studies. First, unlike typical biomarkers, values of a metabolite index (as constructed in this work) do not have any absolute reference, due to the standardization of the individual metabolite concentrations and of the final weighted index. Thus, a metabolite index of 0 implies the population average. Further, metabolites are summed with potentially both negative and positive weights, accounting for the fact that some are positively, and some are negatively, associated with increased OSA risk. A higher final weighted sum is always associated with increased OSA risk. Second, it is worth noting that the metabolite indices sum metabolites that were rank-normalized within studies. If targeted metabolomics analyses had been used, it might have been more appropriate not to rank-normalize or to apply standardization in a different way. More work is needed to learn how to transfer metabolite indices constructed based on an untargeted metabolomics survey to targeted measurements. Finally, we reported metabolite index associations in HCHS/SOL and in MESA while standardizing the indices in each study separately. As the metabolite indices are sums of normalized metabolites in the sample, their mean is close to zero, but may be slightly different from zero depending on the specific sample used. Their variance depends not only on the variance of each (normalized) metabolite, but also on the correlation between metabolites in the sample, which may cause differences in the scale of metabolite indices before z-scoring between studies. While z-scoring metabolite indices does not affect the strength of associations, it does affect the estimated effect sizes.
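The scale issue described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: it assumes that rank-normalization is a rank-based inverse normal transform, uses two simulated metabolites, and uses hypothetical weights of 0.5 each (the function names `rank_normalize`, `metabolite_index`, and `simulate` are introduced here for illustration only). It shows that the same weights yield raw indices with different standard deviations when the metabolites are more or less correlated, while the z-scored index has unit variance in both cases.

```python
import random
from statistics import NormalDist, mean, stdev

def rank_normalize(values):
    """Rank-based inverse normal transform (assumed form of rank-normalization).
    Ties are not handled; the simulated data below are continuous."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = NormalDist().inv_cdf((rank - 0.5) / n)
    return out

def zscore(values):
    """Standardize to mean 0 and standard deviation 1 within the sample."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def metabolite_index(columns, weights):
    """Weighted sum of rank-normalized metabolites, returned raw and z-scored.
    `columns` holds one list of values per metabolite; `weights` are hypothetical."""
    normed = [rank_normalize(col) for col in columns]
    n = len(columns[0])
    raw = [sum(w * col[i] for w, col in zip(weights, normed)) for i in range(n)]
    return raw, zscore(raw)

def simulate(rho, n=2000, seed=0):
    """Two simulated metabolites with correlation of about `rho` (illustration only)."""
    rng = random.Random(seed)
    a, b = [], []
    for _ in range(n):
        shared, e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        a.append(rho ** 0.5 * shared + (1 - rho) ** 0.5 * e1)
        b.append(rho ** 0.5 * shared + (1 - rho) ** 0.5 * e2)
    return [a, b]

weights = [0.5, 0.5]  # hypothetical weights, not the published index weights
raw_low, z_low = metabolite_index(simulate(0.0), weights)
raw_high, z_high = metabolite_index(simulate(0.8), weights)

# The raw index is more spread out when the metabolites are correlated,
# whereas the z-scored index has standard deviation 1 in both samples.
sd_raw_low, sd_raw_high = stdev(raw_low), stdev(raw_high)
print(sd_raw_low, sd_raw_high, stdev(z_low), stdev(z_high))
```

Under this sketch, a fixed log-odds coefficient per unit of the raw index corresponds to that coefficient multiplied by the sample standard deviation of the raw index once the index is z-scored, which is one way to see why per-SD effect sizes can differ between studies even when the strength of association is unchanged.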
Reporting of associations and risk by in-sample quartiles, in contrast, does not depend on scales and z-scoring, but also suffers from limited transferability, as one person who may be in a low quartile in one set of samples may be in a higher quartile in a different set of samples. More work is needed to develop a framework for transferability of metabolite indices across studies.
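The limited transferability of in-sample quartiles can be sketched in the same spirit. The example below is purely illustrative: it assumes, for the sake of argument, that index values are measured on a common scale in two hypothetical cohorts whose distributions differ by a shift, and the helper `quartile_of` is a name introduced here rather than part of any analysis in this study.

```python
import random
from statistics import quantiles

def quartile_of(value, sample):
    """Return 1-4: the in-sample quartile of `sample` that `value` falls into."""
    cuts = quantiles(sample, n=4)  # three quartile cut points
    return 1 + sum(value > cut for cut in cuts)

rng = random.Random(1)
# Hypothetical cohorts: index values in cohort B are shifted downwards.
sample_a = [rng.gauss(0.0, 1.0) for _ in range(1000)]
sample_b = [rng.gauss(-1.0, 1.0) for _ in range(1000)]

person = 0.3  # the same index value, evaluated against each cohort
qa = quartile_of(person, sample_a)
qb = quartile_of(person, sample_b)
print(qa, qb)  # the quartile assigned depends on the reference sample
```

Here the same individual is placed in a higher quartile when the reference cohort has lower index values overall, which is the transferability concern raised above.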
In summary, we used two large datasets of population-based multi-ethnic cohort studies to study metabolomics associations with OSA. We developed metabolite indices that replicated across datasets, and had a statistically significant association with OSA even after adjusting for multiple lifestyle factors and cardiometabolic comorbidities. In future work we will study the possibility of developing an OSA screening tool based on metabolite indices. Four metabolites also replicated in an independent dataset, of which one was previously implicated in OSA, and two were previously connected to sleep disorders. Collectively, our findings support the utility of metabolomic profiling for generating metabolite indices of sleep apnea in racially and ethnically diverse populations, and its potential to provide insight into the pathophysiology of OSA.

Data availability
In accordance with participants' informed consent, MESA and HCHS/SOL data are available through data use agreement in dbGaP according to the study-specific accessions. MESA phenotypes are available in: phs000209; MESA metabolomics data have been deposited and will become available in: phs001416; and HCHS/SOL phenotypes: phs000810. HCHS/SOL metabolomics data are available via data use agreement with the HCHS/SOL Data Coordinating Center at the University of North Carolina at Chapel Hill; see the collaborators website: https://sites.cscc.unc.edu/hchs/. All data generated or analyzed during this study are included in this published article and its supplementary information file. In addition, summary statistics from single metabolite association analyses in MESA and HCHS/SOL for all studied metabolites are provided as Supplementary Information.